# Preference Learning

## URM-LLaMa-3.1-8B
LxzGordon · Large Language Model

URM-LLaMa-3.1-8B is an uncertainty-aware reward model designed to enhance the alignment of large language models.
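A reward model such as this scores each candidate response with a scalar. Under the common Bradley-Terry preference model, the probability that one response is preferred over another is the sigmoid of their score difference, and best-of-n selection just ranks candidates by score. The sketch below illustrates that pattern in plain Python; the function names are illustrative, not URM's actual API, and URM's uncertainty estimate is not modeled here:

```python
import math

def preference_prob(reward_a, reward_b):
    """Bradley-Terry probability that response A is preferred over
    response B, given scalar reward-model scores."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def rank_responses(scored):
    """Sort (response, score) pairs by reward score, best first —
    the ranking step of best-of-n selection with a reward model."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Equal scores give a preference probability of 0.5; a two-point score gap already implies roughly an 88% preference probability, which is why small reward margins can still drive strong selection pressure.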
## Llama-3-Base-8B-SFT-IPO
princeton-nlp · Large Language Model · Transformers

Llama-3-Base-8B-SFT-IPO is a preference-optimized checkpoint from princeton-nlp's SimPO project; SimPO is a simple preference optimization method that eliminates the need for a reference model, aiming to improve performance by simplifying the preference optimization process.
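SimPO's reference-free objective uses the length-normalized average log-likelihood as an implicit reward and pushes the chosen response's reward above the rejected one's by a target margin. A minimal per-pair sketch (the `beta` and `gamma` defaults here are illustrative, not tuned values):

```python
import math

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=0.5):
    """SimPO objective on one preference pair: length-normalized
    log-likelihood margin through -log sigmoid, no reference model."""
    r_chosen = beta * logp_chosen / len_chosen        # implicit reward, chosen
    r_rejected = beta * logp_rejected / len_rejected  # implicit reward, rejected
    margin = r_chosen - r_rejected - gamma            # target reward margin gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Dividing by the sequence length keeps long responses from accumulating an artificial log-likelihood advantage, and dropping the reference model removes a second forward pass per example during training.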
## AmberSafe
LLM360 · Apache-2.0 · Large Language Model · Transformers · English

AmberSafe is a safety fine-tuned instruction model based on LLM360/AmberChat, part of the LLM360 Pebble series, focused on safe text generation.
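Safety fine-tuning of this kind is commonly implemented with preference optimization over pairs of safe and unsafe responses, for example direct preference optimization (DPO). The sketch below is the generic per-pair DPO loss, not AmberSafe's exact recipe, and the `beta` default is illustrative:

```python
import math

def dpo_loss(logp_policy_chosen, logp_policy_rejected,
             logp_ref_chosen, logp_ref_rejected, beta=0.1):
    """DPO loss on one preference pair: the margin between
    policy-vs-reference log-ratios, scaled by beta, through -log sigmoid."""
    ratio_chosen = logp_policy_chosen - logp_ref_chosen      # chosen log-ratio
    ratio_rejected = logp_policy_rejected - logp_ref_rejected  # rejected log-ratio
    margin = beta * (ratio_chosen - ratio_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At initialization the policy equals the reference, both log-ratios are zero, and the loss is ln 2; the loss falls as the policy raises the safe response's likelihood relative to the unsafe one. Unlike SimPO above, DPO anchors training to a frozen reference model.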
© 2025 AIbase